ABSTRACT


Pattern recognition schemes have been concerned, mainly,
with the problem of identifying alphabetic characters or
numerals. To this end adaptive pattern recognition devices have
been trained to recognize such characters. The topic discussed
here is the training of an adaptive pattern recognition device,
Adaline, to mimic the performance of the controller of a plant.

This study discusses Adaline and the minimum square error
method of adaption. Adaline is trained by observing the behaviour
of the controller in numerous situations. The problem of
coding the information to be presented to Adaline is discussed
and, finally, a suitably trained Adaline takes control of the
plant.
INTRODUCTION


The problem of automatic character recognition has received
a great deal of attention recently. Many schemes have been
proposed and they can be divided into two main groups, "open
loop" types and "closed loop" types. Open loop schemes compare
the character to be recognized with a library stock and make
a decision on this basis. These schemes are very useful when
the characters are, essentially, standard (possibly typewritten).
Closed loop schemes are trained to recognize characters by
applying as many patterns as possible to the machine while
"teaching" it to generate the correct answers before asking it
to operate alone. These schemes have been described by such
names as "linear decision network" or "adaptive linear neuron"
(Adaline). Their advantage over open loop schemes is that
they are able to attempt the classification of patterns other
than those on which the machine has been trained.

The latter scheme has application to process control.
The human operator of a steel rolling mill is presented with
such data as [illegible] and temperature of the incoming billet.
On the basis of this knowledge and past experience he can
adjust the rolls to produce the desired sheet steel. A pattern
recognition device could be placed beside him and be trained
to recognize suitably coded patterns of information about
the billet and thus to produce the same "response" as
the operator in control of the rolls.
The present investigation considers a simple relay activated
plant, which is already controllable, and considers the training
of a pattern recognition device to produce the same result as
the relay controller. The block diagram of a stable plant
is shown in Figure (a), where $d(t)$ is the control signal which
causes the relay to apply driving power to the plant. If the
linear control system error is defined to be the difference between
desired and actual output at any given time, then a convenient way
of describing the behaviour of a plant is to consider the
system error and its derivatives as functions of time.


[Block diagram: compensation, control signal $d(t)$, relay, plant, output $c(t)$]

Figure (a)
The pattern recognizer is, therefore, presented with suitably
coded patterns of error, error rate (and higher derivatives
if necessary) and it is then trained to produce an output
signal, $a(t)$, close to $d(t)$. This is indicated in Figure (b).
Training involves presenting the pattern recognizer with as many
typical operating conditions as possible, while adjusting it
so that $a(t)$ matches $d(t)$ as closely as possible over the
entire range of conditions. After the training period the
compensator can be disconnected as the learning machine takes
its place. This is shown in Figure (c).
Figure (c)


The present study discusses the properties of the pattern
recognition device (in Chapter I) and a specific method of
adaption (in Chapter II). The training of the learning machine
with patterns of error and error rate and the control of the
plant by the learning machine are discussed in Chapter III.
[...] appear in Chapter IV.
CHAPTER I
PATTERN RECOGNITION


An adaptive pattern recognition device of the type to
be considered has four essential components: a sensory unit,
an association unit, a response unit and an adjustment unit.
The whole device is shown in Figure 1. It is expected of
such a device that it can be trained to recognize stimuli or
patterns which are part of the environment in which it is placed.
Figure 1

The sensory unit is a transducer which produces,
possibly, a set of electrical signals in response to a visual
or audio pattern. If the patterns which are to be recognized
are alphabetical or numerical, then they could be displayed
on a matrix of photo-cells which could, in turn, generate
positive or negative voltages, depending on the presence
or absence of an element of the pattern. This is shown in
Figure 2.





Figure 2 
Throughout the remainder of the discussion the characteristics
of the transducer will be bypassed and the pattern, or input,
for the association unit will be considered to be simply
an array of positive and negative voltages of unit magnitude.

The association unit is a logical decision element
which produces an output on receipt of the input pattern.
It will be assumed to consist of a set of adjustable weights
of number $N+1$ when the number of elements of the input pattern
is $N$. The $N$ elements of the input pattern are supplied
to the weights, $W_1, W_2, \ldots, W_N$. The weight, $W_0$, is called
the threshold and its input is fixed at $+1$. The values of
the weights and threshold are determined by previous training.
In this study all the weights and threshold will have any
past experience removed by setting them to zero before
a training sequence begins. If the elements of a pattern
are supplied to the device, the sum of the outputs of the
weights and the threshold is called the analogue output, $a$.
This is shown in Figure 3.
The function of the response unit is to indicate which
decision has been made on the pattern applied to the sensor.
In this discussion the characteristics of the response unit
will be bypassed and the analogue output, $a$, will be used
as an indication of the pattern which has been applied.

These ideas have been discussed by many authors [4, 5].
Widrow has named the device "Adaline" (Adaptive
linear neuron).

1.1 The Adaptive Mechanism 

The problem of pattern recognition using Adaline
could be stated in the following way: given $M$ input
patterns, each having $N$ pattern elements, separate them
into two classes, some of the patterns being mapped to
positive values of analogue output and the remainder
to negative values of analogue output.

Each input pattern can be thought of as a column vector,
$[X]_i$, where

$$[X]_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ij} \\ \vdots \\ x_{iN} \end{bmatrix}$$

with $x_{ij} = \pm 1$, and $[X]_i$ the $i$ th pattern of $M$. If the $i$ th
pattern, $[X]_i$, is applied to the device, the resulting analogue
output, $a_i$, is

$$a_i = W_1 x_{i1} + W_2 x_{i2} + \cdots + W_j x_{ij} + \cdots + W_N x_{iN} + W_0 = \sum_{j=1}^{N} W_j x_{ij} + W_0$$

where $W_0$ and $W_1, W_2, \ldots, W_N$ are the threshold and
set of weights respectively. To be a competent pattern recognizer
Adaline must be trained, in some way, to make $a_i$ match $d_i$,
where $d_i$ is the required or desired output for the $i$ th
pattern. After training, which entails adaption of the weights,
it is hoped that when presented with a pattern, $[X]_i$, Adaline
will make the correct classification by generating an output
as nearly as possible equal to $d_i$.

If $(d_i - a_i)$ is defined as the analogue error, $e_i$,
any training scheme which will make all the $e_i$ small is worth
considering. The adaption scheme studied here attempts to
minimize the sum of the squares of the errors for all the
patterns presented, i.e. the scheme minimizes $\sum e_i^2$. This
topic is discussed fully in Chapter II. Other adaption schemes
are given by Treado [6]. For each pattern, $[X]_i$, the
analogue error, $e_i$, is measured. An equal adjustment is
then made to each weight. This adjustment is proportional
to the error, $e_i$, and is of such a sign as will reduce
$e_i$ and $\sum e_i^2$. Thus the change to the weight, $W_j$, at each step
is given by:

$$\Delta W_j = g\, e_i\, x_{ij} \tag{1.1}$$

and for the threshold, $W_0$, it is:

$$\Delta W_0 = g\, e_i$$

where $g$ is a constant and where it will be assumed that the
weights can be varied continuously. The total adjustment


after all patterns have been presented once is:

$$\Delta W_j = g \sum_{i=1}^{M} e_i\, x_{ij} \tag{1.2}$$

$$\Delta W_0 = g \sum_{i=1}^{M} e_i$$


The adjustments of equation 1.1 are made if $a_i$ is not equal
to $d_i$. Adjustments could, therefore, continue until the
analogue error, $e_i$, is zero. Hence the minimum of $\sum e_i^2$
found, if training is carried on for long enough, will be
zero. (See also Chapter II)
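
The single adaption step of equation 1.1 can be sketched in a few lines of Python. This is only an illustration: the function names, the four-element pattern and the starting weights are hypothetical choices, not taken from the study.

```python
def adaline_output(weights, threshold, pattern):
    """Analogue output a_i = sum_j W_j * x_ij + W_0."""
    return sum(w * x for w, x in zip(weights, pattern)) + threshold


def adapt_once(weights, threshold, pattern, desired, g):
    """Apply equation 1.1 for one pattern; return the new weights and threshold."""
    error = desired - adaline_output(weights, threshold, pattern)
    new_weights = [w + g * error * x for w, x in zip(weights, pattern)]
    new_threshold = threshold + g * error
    return new_weights, new_threshold


# Hypothetical pattern of four +/-1 elements with a desired output of +10.
# All weights and the threshold start at zero, as in the text.
weights, threshold = [0.0] * 4, 0.0
pattern, desired = [1, -1, 1, 1], 10.0
for _ in range(20):
    weights, threshold = adapt_once(weights, threshold, pattern, desired, g=0.05)
final_error = desired - adaline_output(weights, threshold, pattern)
```

With a single pattern every weight changes by the same magnitude at each step, which is the "equal adjustment" described above.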

Three aspects of equation 1.2 will be examined in a
later section:

1. The speed with which the error, $e_i$, is reduced as a
   succession of training patterns is presented and the
   dependence of this speed of convergence on $g$.

2. The fact that the error, $e_i$, in response to one
   of the $M$ patterns, may be too large even though $\sum e_i^2$
   is acceptably small, but not zero.

3. The fact that the constant, $g$, can take on values
   which make $\sum e_i^2$ diverge. If the constant, $g$, takes
   the value $\alpha/n$, where $n$ is the number of weights and
   $\alpha$ is a constant, Mays [7] showed that $\alpha < 2$ for
   convergence.
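
A short calculation, offered as an illustrative sketch and not as the argument of Mays [7], suggests why the size of the constant matters. It assumes that a single pattern is presented repeatedly and uses only equation 1.1 together with the fact that every pattern element has unit magnitude. Primes denote values after one adjustment.

```latex
\begin{aligned}
a_i' &= a_i + \sum_{j=1}^{N} \Delta W_j\, x_{ij} + \Delta W_0
      = a_i + g\, e_i \Bigl( \sum_{j=1}^{N} x_{ij}^2 + 1 \Bigr)
      = a_i + g\,(N+1)\, e_i , \\
e_i' &= d_i - a_i' = \bigl( 1 - g\,(N+1) \bigr)\, e_i .
\end{aligned}
```

Under this single-pattern assumption the error shrinks at every step only when $|1 - g(N+1)| < 1$, that is, when $0 < g(N+1) < 2$. With several patterns the adjustments interact, so this should be read as a plausibility argument only.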

1.2 Separability of Patterns 
Adaline can be considered to be the realization of a
linear decision function. Input patterns can be regarded
as sets of points in $N$-space and a linear decision function
is any partitioning of the space by a hyperplane of dimension
$N-1$. The pattern recognition problem requires the selection
of a set of weights which will define an appropriate hyperplane.
In an adaption scheme which uses the mean square error adaption
rules the weights defining a separating hyperplane are those
which cause $\sum e_i^2$ and the analogue errors, $e_i$, for each
pattern, $[X]_i$, to be zero. Hence weights must be chosen to
satisfy the following equations:


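$$
\begin{aligned}
x_{11}W_1 + x_{12}W_2 + \cdots + x_{1j}W_j + \cdots + x_{1N}W_N + W_0 &= d_1 \\
x_{21}W_1 + x_{22}W_2 + \cdots + x_{2j}W_j + \cdots + x_{2N}W_N + W_0 &= d_2 \\
&\;\;\vdots \\
x_{i1}W_1 + x_{i2}W_2 + \cdots + x_{ij}W_j + \cdots + x_{iN}W_N + W_0 &= d_i \\
&\;\;\vdots \\
x_{M1}W_1 + x_{M2}W_2 + \cdots + x_{Mj}W_j + \cdots + x_{MN}W_N + W_0 &= d_M
\end{aligned}
\tag{1.3}
$$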
The iterative training scheme of section 1.1, if carried to
its conclusion, will yield a set of weights defined by the
above equations if it is possible to map the patterns,
$[X]_1, [X]_2, \ldots, [X]_i, \ldots, [X]_M$, to outputs of $d_1, d_2,
\ldots, d_i, \ldots, d_M$. If it is possible to map all the patterns, with which
Adaline is trained, to the desired outputs then the patterns
are said to be separable. After training has been completed
with a set of separable patterns the values for the
threshold and the weights are fixed at $W_0^f, W_1^f, W_2^f, \ldots, W_N^f$
and the equation of the hyperplane dividing $N$-space is,
therefore,

$$X_1 W_1^f + X_2 W_2^f + \cdots + X_N W_N^f + W_0^f = 0 \tag{1.4}$$

where $X_1, X_2, \ldots, X_N$ are the coordinate axes of $N$-space.
If the $i$ th pattern, $[X]_i$, with coordinates $x_{i1}, x_{i2}, \ldots,
x_{iN}$ in $N$-space, is now presented, it should yield an analogue
output,

$$a_i = x_{i1} W_1^f + x_{i2} W_2^f + \cdots + x_{iN} W_N^f + W_0^f \tag{1.5}$$

which is approximately equal to the desired output, $d_i$.
Equation 1.5 is the equation of a hyperplane on one side of
the dividing hyperplane given by equation 1.4. Hence another
definition of separability is that a set of patterns can be
said to be separable if one group of them lie on one or more
parallel hyperplanes on one side of, and parallel to, the
dividing hyperplane and the other group lie on one or more
parallel hyperplanes on the other side of, and parallel to,
the dividing hyperplane. If a further restriction is placed
on the minimum mean square error adaption scheme, the definition
of separability is even simpler. Consider an example with $M$
patterns (as described in equation 1.3) where $R_1$ of them
have desired outputs of $+d$ and $R_2$ of them have desired outputs
of $-d$ (and $R_1 + R_2 = M$). If the patterns are separable, then
the threshold and weights, $W_0^f, W_1^f, W_2^f, \ldots, W_N^f$, can be found.
Further, after training and on presenting the $R_1$ patterns
to these weights, a hyperplane on one side of the dividing
hyperplane is defined. If the $R_2$ patterns are now presented
a hyperplane on the other side of the dividing hyperplane
is defined. Both hyperplanes are parallel to the dividing
hyperplane and equal distances from it. These ideas can
be clarified if specific examples in 2-space are considered.
In 2-space Adaline consists of two weights, $W_1$ and $W_2$, and
a threshold, $W_0$. In the example illustrated in Figure 4 there
are two patterns:


$[X]_1$: $x_{11} = 1$, $x_{12} = -1$, $d_1 = d$

$[X]_2$: $x_{21} = -1$, $x_{22} = 1$, $d_2 = -d$

Figure 4


From equation 1.3 the equations defining the weights required
for separation are:

$$W_1 - W_2 + W_0 = d$$
$$-W_1 + W_2 + W_0 = -d$$

These can be solved only by choosing a value for one of the
weights and solving for the other two. The equations are
consistent¹ but yield an infinite number of dividing lines
which pass through the point $X_1 = 0$, $X_2 = 0$.


In the example illustrated in Figure 5 there are three
patterns:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$

$[X]_2$: $x_{21} = 1$, $x_{22} = -1$, $d_2 = -d$

$[X]_3$: $x_{31} = -1$, $x_{32} = 1$, $d_3 = -d$

Figure 5


¹ The rank of the coefficient matrix is equal to the rank
of the augmented matrix.








The equations defining the weights required for separation
are:

$$W_1 + W_2 + W_0 = d$$
$$W_1 - W_2 + W_0 = -d$$
$$-W_1 + W_2 + W_0 = -d$$

These can be solved and unique values of $W_0$, $W_1$, $W_2$ are
found to be $W_0^f = -d$, $W_1^f = d$, $W_2^f = d$. Hence the equation of
the dividing line is:

$$X_1 + X_2 - 1 = 0$$

The two parallel hyperplanes for each class of patterns are
indicated in Figure 5.
Two inseparable cases are now considered. In the case
which is illustrated in Figure 6 there are four patterns:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$

$[X]_2$: $x_{21} = 1$, $x_{22} = -1$, $d_2 = -d$

$[X]_3$: $x_{31} = -1$, $x_{32} = 1$, $d_3 = -d$

$[X]_4$: $x_{41} = -1$, $x_{42} = -1$, $d_4 = -d$

Figure 6


The four equations to be solved for the weights are:

$$W_1 + W_2 + W_0 = d$$
$$W_1 - W_2 + W_0 = -d$$
$$-W_1 + W_2 + W_0 = -d$$
$$-W_1 - W_2 + W_0 = -d$$

and they are inconsistent². A dividing line cannot be placed
between them and satisfy the given conditions.


Another inseparable case is shown in Figure 7. There
are four patterns:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$

$[X]_2$: $x_{21} = -1$, $x_{22} = -1$, $d_2 = d$

$[X]_3$: $x_{31} = 1$, $x_{32} = -1$, $d_3 = -d$

$[X]_4$: $x_{41} = -1$, $x_{42} = 1$, $d_4 = -d$

Figure 7
The resulting four equations are inconsistent and cannot
be solved for the weights. It is obvious, from Figure 7,
that one line cannot separate the patterns.
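
The consistency test used in the footnotes (equal ranks of the coefficient and augmented matrices) can be sketched in Python. The helper names are hypothetical. The pattern sets for Figures 5 and 6 follow the equations given for those examples, while the Figure 7 set is assumed here to be the arrangement in which diagonally opposite corners share a desired output.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_count = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next(
            (r for r in range(rank_count, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(len(rows)):
            if r != rank_count and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank_count][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count


def is_separable(patterns, desired):
    """True when equations 1.3 are consistent for the given patterns."""
    coeff = [list(p) + [1] for p in patterns]  # columns W_1..W_N, then W_0
    augmented = [row + [d] for row, d in zip(coeff, desired)]
    return rank(coeff) == rank(augmented)


d = 1  # any non-zero magnitude of desired output
figure_5 = is_separable([(1, 1), (1, -1), (-1, 1)], [d, -d, -d])
figure_6 = is_separable([(1, 1), (1, -1), (-1, 1), (-1, -1)], [d, -d, -d, -d])
figure_7 = is_separable([(1, 1), (-1, -1), (1, -1), (-1, 1)], [d, d, -d, -d])
```

Exact fractions are used so that the rank comparison is not affected by rounding.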
It should be emphasised that the examples discussed use
minimum mean square error adaption. Since this scheme requires
convergence to a desired value of output (with the accompanying
precise positioning of the hyperplane) it makes complete
separation of apparently separable patterns (such as shown
in Figure 6) impossible. Other "less precise" schemes, as
discussed by Treado [6], do not encounter some of these
difficulties.

² The rank of the coefficient matrix is different from
the rank of the augmented matrix.

The problem of separability will be mentioned again in 
Chapter III where coding is considered. 
1.3 Experimental Simulation of Adaline 

A computer program was written to simulate Adaline and to test
its ability to separate pattern sets. Two pattern sets were
applied: a group of five 'C's was to produce an output of
+10 and a group of five 'T's to produce an output of -10.
The patterns and resultant inputs to Adaline are shown in
Figure 8. During the training phase one of the patterns was
presented and an appropriate adjustment made to the weights.
Each of the ten patterns was then presented and the error, $e_i$,
in each case was measured. $\sum e_i^2$ was then calculated. This
process was carried out 100 times. The value of the constant,
$g$, in equation 1.2 was initially chosen to be 0.05.

A graph of $\sum e_i^2$ against number of adaptions is shown
in Figure 9. This has been called a "learning curve" by
Widrow [5]. It indicates how many presentations of the
patterns are necessary before Adaline is able to recognize
all of the patterns with small error. From the learning
curve shown it can be seen that $\sum e_i^2$ has
dropped rapidly, after 20 adaptions, from the large initial
value of 1000 to a value of 47. This might seem acceptable
but the use of $\sum e_i^2$ as a criterion is dangerous since it may
be generated almost entirely by one or two unacceptably large
errors. Training can be considered complete only when all
the errors are zero or, if this requires too many adaptions,
when the error for each pattern lies within an acceptable
limit.

[Figure panel: patterns mapped to +10]

Figure 9(a)

The five 'C's and five 'T's were used again as test
patterns to find the effect of the constant, $g$, on the rate
of adaption. Values of $\sum e_i^2$, after various numbers of iterations,
plotted against $g$ are shown in Figure 10. The "best" value
of $g$, that producing the smallest $\sum e_i^2$ after a fixed number
of iterations, is found to be approximately 0.06. If $g = \alpha/n$, where
$\alpha$ is a constant and $n$ is the number of weights, the best value
of $\alpha$ is 1.02 for an Adaline with 16 weights (0.06 x 17 = 1.02,
the threshold being counted as a seventeenth weight).

Learning curves for other values of $g$ are shown in
Figures 11 and 12. For small values of $g$ the adjustments
are small and $\sum e_i^2$ is reduced with few fluctuations. When
$g$ approaches the maximum permissible value, the adjustments
are large and $\sum e_i^2$ experiences large fluctuations before a
suitably low value is approached. For values of $g$ greater
than 0.1167 the adjustments are too large and $\sum e_i^2$ diverges.

The problem of large analogue errors accompanied by what
seems an acceptably low $\sum e_i^2$ has been mentioned. In all
cases examined, however, no individual error was ever
prohibitively large. Figure 13 is a table of values of
analogue error corresponding to a value of $\sum e_i^2$ for various
values of $g$ and various numbers of iterations.

Figure 13























If training were continued indefinitely, the errors, $e_i$ and
$\sum e_i^2$, could, presumably, be brought close to zero. This is
not practical, however, and a more realistic approach is
to limit the number of iterations and accept an analogue error.
Here the concept of a "dead zone" could be introduced. Thus,
if the desired value of output is +10, a tolerance is placed
on this value so that if, after training, a pattern produces
an output of, say, +10 ± 3 it would be classed in the +10
group. In addition to making an excessive number of training
iterations unnecessary this may enable Adaline to recognize
patterns which it has not seen during training, such as a
training pattern which has been contaminated by noise. If
Adaline is used with a threshold device, two modifications
could be made. Consider the Adaline shown in Figure 14.





Figure 14


The threshold device yields an output of +1 if its input is
greater than +7 and an output of -1 if its input is more
negative than -7. Only the lower half of the dead zone, or
tolerance, is used here. An analogue input of 10 to the
threshold element is aimed at but, in this case, if values
of output, $a_i$, are greater than 10, adaption to reduce
this to 10 would seem pointless. A new training scheme would
be to adjust the weights (by the old rule) to make the output
approach 10 from below. If the output is greater than 10
no adaption should take place.
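
The modified training scheme and the threshold device might be sketched as follows. Two points are assumptions made for the illustration: the same one-sided rule is applied symmetrically to a desired output of -10, and the threshold device is given an output of 0 inside the dead zone, which the text leaves unspecified. The function names are hypothetical.

```python
def adapt_one_sided(weights, threshold, pattern, desired, g):
    """Adapt by equation 1.1 only while the output falls short of the target.

    Outputs that already reach or exceed the target in magnitude, with the
    correct sign, are left alone (assumed symmetric for negative targets).
    """
    output = sum(w * x for w, x in zip(weights, pattern)) + threshold
    overshoots = output * desired > 0 and abs(output) >= abs(desired)
    if overshoots:
        return weights, threshold  # no adaption takes place
    error = desired - output
    new_weights = [w + g * error * x for w, x in zip(weights, pattern)]
    return new_weights, threshold + g * error


def threshold_device(analogue, limit=7.0):
    """+1 above +limit, -1 below -limit, 0 (assumed) inside the dead zone."""
    if analogue > limit:
        return 1
    if analogue < -limit:
        return -1
    return 0


# Hypothetical four-element pattern whose output of 12 already exceeds +10.
unchanged = adapt_one_sided([3.0] * 4, 0.0, [1, 1, 1, 1], 10.0, g=0.05)
# The same pattern with an output of 4 is adapted towards +10 from below.
adapted = adapt_one_sided([1.0] * 4, 0.0, [1, 1, 1, 1], 10.0, g=0.05)
```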







CHAPTER II 
A NOTE ON MINIMUM SQUARE ERROR ADAPTION 


There are many methods of approximating a polynomial by
a straight line or curve. One common method is described by
texts on numerical analysis [3] and receives the name
"minimum square error approximation." If the curve shown in
Figure 15 is a polynomial, $p(x)$, and is approximated by a
straight line, $l(x) = ax + b$, then the constants, $a$ and $b$,
can be found for which $l(x)$ satisfies the minimum mean square
error criterion, i.e. the sum, $\sum e_i^2 = e_1^2 + e_2^2 + \cdots + e_i^2 + \cdots + e_n^2$,
where $e_i = p(x_i) - l(x_i)$, must be minimized. If $F = \sum e_i^2$
then the values of $a$ and $b$ which minimize $F$ can be found
using the techniques of differential calculus.
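
As a small worked illustration, the two conditions obtained by setting the derivatives of F with respect to a and b to zero can be solved in closed form. The function name and the sampled polynomial below are hypothetical choices made for the sketch.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum((y - (a*x + b))**2) over the samples."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Setting dF/da = 0 and dF/db = 0 gives two linear equations in a and b.
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


# Hypothetical polynomial p(x) = x**2 sampled at five points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]
a, b = fit_line(xs, ys)  # gives a = 4, b = -2 for these samples
```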
[Plot of the polynomial $p(x)$ and the straight line $l(x)$ against $x$, with sample points $x_i$]

Figure 15
Consider the problem of training Adaline. Patterns, $[X]_1$,
$[X]_2, \ldots, [X]_M$, have desired outputs of $d_1, d_2, \ldots, d_M$.
At any point during the training phase, when the output
of Adaline in response to these patterns is $a_1, a_2, \ldots, a_M$,
the errors are $e_1, e_2, \ldots, e_M$. Adaption requires the
adjustment of the threshold and weights in such a way as to
make $a_1, a_2, \ldots, a_M$ approach $d_1, d_2, \ldots, d_M$. By carrying
over the concepts of polynomial approximation the sum of the
squares of these errors can be found and, by using this as
a criterion, the weights can be found which minimize $\sum e_i^2$.

2.1 Analytical Minimization of $\sum e_i^2$
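Before training Adaline is assumed to have the threshold
and weights of value, $W_0, W_1, W_2, \ldots, W_N$. If
the $M$ training patterns are now presented the outputs are
given by the following equations:

$$
\begin{aligned}
a_1 &= x_{11}W_1 + x_{12}W_2 + \cdots + x_{1j}W_j + \cdots + x_{1N}W_N + W_0 \\
a_2 &= x_{21}W_1 + x_{22}W_2 + \cdots + x_{2j}W_j + \cdots + x_{2N}W_N + W_0 \\
&\;\;\vdots \\
a_M &= x_{M1}W_1 + x_{M2}W_2 + \cdots + x_{Mj}W_j + \cdots + x_{MN}W_N + W_0
\end{aligned}
$$

The expressions for the errors are:

$$e_1 = d_1 - a_1, \quad e_2 = d_2 - a_2, \quad \ldots, \quad e_M = d_M - a_M$$

The sum, $\sum_{i=1}^{M} e_i^2$, is formed and this must be minimized by
adjusting the threshold and the weights. If $\sum e_i^2$ is called $S$,
then

$$S = (d_1 - a_1)^2 + (d_2 - a_2)^2 + \cdots + (d_M - a_M)^2$$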
Analytically, the values of the threshold and weights which
yield the minimum value of $S$ can be found by taking the
partial derivatives of $S$ with respect to the threshold and
weights and equating the derivatives to zero, i.e.

$$\frac{\partial S}{\partial W_0} = 0, \quad \frac{\partial S}{\partial W_1} = 0, \quad \ldots, \quad \frac{\partial S}{\partial W_N} = 0$$

There are $N+1$ equations and $N+1$ unknowns and, if these
equations are consistent, they can be solved for $W_0^f, W_1^f,
W_2^f, \ldots, W_N^f$, the values of threshold and weights which produce
minimum $S$. The equations are:


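$$
\begin{aligned}
(d_1 - a_1) + (d_2 - a_2) + \cdots + (d_i - a_i) + \cdots + (d_M - a_M) &= 0 \\
(d_1 - a_1)x_{11} + (d_2 - a_2)x_{21} + \cdots + (d_i - a_i)x_{i1} + \cdots + (d_M - a_M)x_{M1} &= 0 \\
&\;\;\vdots \\
(d_1 - a_1)x_{1j} + (d_2 - a_2)x_{2j} + \cdots + (d_i - a_i)x_{ij} + \cdots + (d_M - a_M)x_{Mj} &= 0 \\
&\;\;\vdots \\
(d_1 - a_1)x_{1N} + (d_2 - a_2)x_{2N} + \cdots + (d_i - a_i)x_{iN} + \cdots + (d_M - a_M)x_{MN} &= 0
\end{aligned}
$$

They can be expressed compactly, with the $j$ th equation given
by $\sum_{i=1}^{M} (d_i - a_i)\, x_{ij} = 0$ or $\sum e_i x_{ij} = 0$, valid for $j = 1, \ldots, N$,
and $\sum_{i=1}^{M} (d_i - a_i) = 0$ or $\sum e_i = 0$ for $j = 0$.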


It is instructive to consider a few examples of this
technique as applied to patterns in two space. For such patterns
there will be three equations, with three unknowns, to be
used to find the best weights, i.e.

$$\sum_{i=1}^{M} e_i = 0$$
$$\sum_{i=1}^{M} e_i\, x_{i1} = 0$$
$$\sum_{i=1}^{M} e_i\, x_{i2} = 0$$


In the example illustrated in Figure 16 the patterns are:

$[X]_1$: $x_{11} = 1$, $x_{12} = -1$, $d_1 = d$

$[X]_2$: $x_{21} = -1$, $x_{22} = 1$, $d_2 = -d$

Figure 16


The errors are given by:

$$e_1 = d - W_1 + W_2 - W_0$$
$$e_2 = -d + W_1 - W_2 - W_0$$

and the three equations are:

$$W_0 = 0$$
$$W_1 - W_2 = d$$
$$W_1 - W_2 = d$$

These equations yield an infinite number of solutions for
the weights so that the weights define an infinite number
of hyperplanes which pass through the point $X_1 = 0$, $X_2 = 0$. If $W_2$ is
chosen to be $d$, then $W_1 = 2d$ and the hyperplane has
equation, $X_2 = -2X_1$. This is the same result as was obtained
in example 1 in section 1.2.


In the example shown in Figure 17 the patterns are:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$

$[X]_2$: $x_{21} = 1$, $x_{22} = -1$, $d_2 = -d$

$[X]_3$: $x_{31} = -1$, $x_{32} = 1$, $d_3 = -d$

Figure 17


The errors are given by:

$$e_1 = d - W_1 - W_2 - W_0$$
$$e_2 = -d - W_1 + W_2 - W_0$$
$$e_3 = -d + W_1 - W_2 - W_0$$

and the equations to be solved for the weights and threshold
are:

$$W_1 + W_2 + 3W_0 = -d$$
$$3W_1 - W_2 + W_0 = d$$
$$W_1 - 3W_2 - W_0 = -d$$

Solution yields: $W_0^f = -d$, $W_1^f = d$, $W_2^f = d$ and
hence, the equation of the dividing hyperplane is $X_2 = 1 - X_1$.
Again this agrees with the results of example 2 in 1.2.





In the example shown in Figure 18 the patterns are:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$

$[X]_2$: $x_{21} = 1$, $x_{22} = -1$, $d_2 = -d$

$[X]_3$: $x_{31} = -1$, $x_{32} = 1$, $d_3 = -d$

$[X]_4$: $x_{41} = -1$, $x_{42} = -1$, $d_4 = -d$

Figure 18


The equations for the errors are:

$$e_1 = d - W_1 - W_2 - W_0$$
$$e_2 = -d - W_1 + W_2 - W_0$$
$$e_3 = -d + W_1 - W_2 - W_0$$
$$e_4 = -d + W_1 + W_2 - W_0$$

and the three equations to be solved for the weights are:

$$4W_0 + 2d = 0$$
$$4W_1 - 2d = 0$$
$$4W_2 - 2d = 0$$

and so $W_0^f = -d/2$, $W_1^f = d/2$, $W_2^f = d/2$. The equation
of the hyperplane which separates the two classes of patterns
is $X_2 = -X_1 + 1$. This apparent contradiction of the
result of example 3 of section 1.2 can be explained in the
following way. Minimizing $S$ analytically results in finding
the particular hyperplane which will separate the patterns
into two classes in such a way as to minimize $\sum e_i^2$. The
equations of section 1.2 describe an adaptive scheme which will
separate patterns with a dividing hyperplane only if the
error, $e_i$, for each pattern and $\sum e_i^2$ can be reduced to zero.
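
The analytical minimization can be checked numerically. The sketch below builds the equations $\sum e_i = 0$ and $\sum e_i x_{ij} = 0$ for a set of patterns and solves them exactly. The function name is hypothetical, and the routine assumes that the equations have a unique solution, which holds for the three- and four-pattern examples but not for the two-pattern example.

```python
from fractions import Fraction


def least_squares_weights(patterns, desired):
    """Solve sum_i e_i = 0 and sum_i e_i * x_ij = 0 for (W_1..W_N, W_0).

    Assumes the equations have a unique solution.
    """
    rows = [[Fraction(x) for x in p] + [Fraction(1)] for p in patterns]
    size = len(rows[0])
    # Each row j of the system is sum_i x_ij * (a_i - d_i) = 0, written as a
    # linear equation in the unknowns (W_1, ..., W_N, W_0).
    system = [
        [sum(r[j] * r[k] for r in rows) for k in range(size)]
        + [sum(r[j] * Fraction(d) for r, d in zip(rows, desired))]
        for j in range(size)
    ]
    for col in range(size):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, size) if system[r][col] != 0)
        system[col], system[pivot] = system[pivot], system[col]
        system[col] = [v / system[col][col] for v in system[col]]
        for r in range(size):
            if r != col:
                factor = system[r][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    solution = [row[-1] for row in system]
    return solution[:-1], solution[-1]  # (weights, threshold)


d = 1
weights, threshold = least_squares_weights(
    [(1, 1), (1, -1), (-1, 1), (-1, -1)], [d, -d, -d, -d]
)
```

For the four patterns of Figure 18 with d = 1 this reproduces the values found above, namely weights of 1/2 and a threshold of -1/2.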


Finally, in the example shown in Figure 19 the patterns are:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$

$[X]_2$: $x_{21} = -1$, $x_{22} = -1$, $d_2 = d$

$[X]_3$: $x_{31} = 1$, $x_{32} = -1$, $d_3 = -d$

$[X]_4$: $x_{41} = -1$, $x_{42} = 1$, $d_4 = -d$

Figure 19

The three equations for the weights yield $W_0 = W_1 = W_2 = 0$.
This means, in effect, that no hyperplane exists which will
separate the patterns. This agrees with the result of example 4
in section 1.2.
2.2 Steepest Descent Minimization of $\sum e_i^2$

If the function, $S = \sum_{i=1}^{M} e_i^2$, has a minimum it can be found
by an iterative procedure; one possible method is described here.
If the threshold and weights have initial values, $W_0, W_1,
W_2, \ldots, W_N$, and these weights do not define the minimum,
then all the patterns, $[X]_1, [X]_2, \ldots, [X]_i, \ldots, [X]_M$, are
presented. The function, $S$, and the $N+1$ partial derivatives
are calculated. The method of steepest descent involves
evaluating the magnitude of the gradient vector at the point
defined by $W_0, W_1, W_2, \ldots, W_j, \ldots, W_N$ and descending a short
distance along the gradient vector towards the minimum of $S$
by adjusting the weights, $W_0, W_1, \ldots, W_N$. The adjustment
made to the $j$ th weight is proportional to $\partial S / \partial W_j$ and is
$\Delta W_j = g \sum e_i x_{ij}$. The adjustment to the threshold is
$\Delta W_0 = g \sum e_i$. $g$ is a
constant of proportionality and affects the size of the increments
to all the weights and the threshold. It should be noted
that the increments, $\Delta W_0, \Delta W_1, \ldots, \Delta W_j, \ldots, \Delta W_N$, are not
necessarily equal. At each new set of values, $W_0, W_1, \ldots,
W_j, \ldots, W_N$, the patterns are again presented and the process
is repeated. This process is continued until the minimum
is found. A memory is required to implement this process.
After presenting each pattern the error, $e_i$, and the $N$ elements
of the $i$ th pattern must be stored until all the patterns
are presented so that $S$, $\Delta W_0$ and $\Delta W_j$ can be calculated.
The calculation of $\Delta W_0$, $\Delta W_j$ could be accomplished using
$N+1$ totaling registers to collect $\sum e_i, \sum e_i x_{i1}, \ldots, \sum e_i x_{iN}$.
It is for this reason that a modified form of steepest descent
(which does not require a memory) is used when training Adaline
to find the values of weights which yield a minimum for $S$.
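
A minimal sketch of the procedure just described follows, with the totaling registers represented by running sums. The function name, the value g = 0.1 and the number of sweeps are arbitrary choices made for the illustration. The three patterns are those of Figure 17 with d = 1.

```python
def steepest_descent(patterns, desired, g, sweeps):
    """Batch adaption: accumulate over all patterns, then adjust once per sweep."""
    n = len(patterns[0])
    weights, threshold = [0.0] * n, 0.0
    for _ in range(sweeps):
        # The N+1 "totaling registers": sum of e_i and sums of e_i * x_ij.
        total_error = 0.0
        totals = [0.0] * n
        for pattern, d in zip(patterns, desired):
            a = sum(w * x for w, x in zip(weights, pattern)) + threshold
            e = d - a
            total_error += e
            for j, x in enumerate(pattern):
                totals[j] += e * x
        # Adjustments are made only after every pattern has been presented.
        weights = [w + g * t for w, t in zip(weights, totals)]
        threshold += g * total_error
    return weights, threshold


d = 1.0
weights, threshold = steepest_descent(
    [(1, 1), (1, -1), (-1, 1)], [d, -d, -d], g=0.1, sweeps=200
)
```

For this separable set the weights approach the analytical solution of section 2.1, that is, both weights near d and the threshold near -d.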
2.3 Modified Steepest Descent


The sum of the squares of the errors, $S$, is given by:

$$S = \sum_{i=1}^{M} e_i^2$$

The adaptive scheme described in this section attempts to
find the weights which yield the minimum for $S$ without the
need for a memory. The first pattern is presented and $e_1^2$
is calculated. This is treated as the function to be minimized
and the steepest descent method is applied to this. Calling
$f_1 = e_1^2 = (d_1 - a_1)^2$, the derivatives are:

$$\frac{\partial f_1}{\partial W_0} = -2(d_1 - a_1)$$
$$\frac{\partial f_1}{\partial W_1} = -2(d_1 - a_1)\, x_{11}$$
$$\vdots$$
$$\frac{\partial f_1}{\partial W_N} = -2(d_1 - a_1)\, x_{1N}$$
It should be noted that the derivatives are equal in magnitude.
As was mentioned in section 2.2, the method of steepest descent
involves making an adjustment, which is proportional to $\partial f_1 / \partial W_j$,
to each of the weights, i.e. $\Delta W_N = g e_1 x_{1N}$, and the adjustment
to the threshold is $\Delta W_0 = g e_1$, where $g$ is a constant.
The adjustments, $\Delta W_0, \Delta W_1, \ldots, \Delta W_N$, are equal in magnitude.
These adjustments are made and then the second pattern is
applied and $f_2 = e_2^2$ is calculated. Equal adjustments are
then made by applying one step of the steepest descent procedure
to $f_2$. This process is repeated until the $M$ patterns
have all been presented. If the minimum has not been
found presentation of the patterns and adjustment continues as
described. The real³ steepest descent method uses the function, $S$,
as the criterion. The criterion for the modified method,
however, changes from $f_1$ to $f_2, \ldots,$ to $f_M$. If the
constant, $g$, is such that the adjustments are very small
then $S \approx f_1 + f_2 + \cdots + f_M$, where $f_i = e_i^2$ and is evaluated
after $i-1$ adjustments. The minimum of $S$ can be found
by continuing to apply this method. In many cases the true
minimum (as obtained by the analytical method of examples
1 and 2 of section 2.1) is found but in some cases (as in
example 3 of section 2.1) no hyperplane defining the
minimum is found. The hyperplane oscillates around a mean
plane which would define the minimum of $S$.
The adaption scheme can be defined as follows:
    Present the patterns in turn and, after each pattern
    is presented, adjust all of the weights and the
    threshold by an equal amount in a direction such
    that the error will be reduced. Adjustment is to
    take place if the error, on application of a pattern,
    is not zero.
This scheme is most frequently called "Minimum Square Error
Adaption" in the literature. The main difference between
"Minimum Square Error Adaption" and the method of steepest descent
is that in the former adjustment is made after each pattern is
presented whereas in the latter, all the patterns are presented
before any adjustment can be made.
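
The pattern-by-pattern scheme can be sketched as below and tried on the four inseparable patterns of Figure 18, where the text expects the hyperplane to hover about the least-squares position instead of settling exactly. The function name, the small constant g = 0.01 and the 500 sweeps are illustrative assumptions.

```python
def modified_descent(patterns, desired, g, sweeps):
    """Adjust after every pattern; no memory of earlier errors is kept."""
    n = len(patterns[0])
    weights, threshold = [0.0] * n, 0.0
    for _ in range(sweeps):
        for pattern, d in zip(patterns, desired):
            e = d - (sum(w * x for w, x in zip(weights, pattern)) + threshold)
            # Every weight and the threshold move by the same magnitude g*|e|.
            weights = [w + g * e * x for w, x in zip(weights, pattern)]
            threshold += g * e
    return weights, threshold


d = 1.0
patterns = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
desired = [d, -d, -d, -d]
weights, threshold = modified_descent(patterns, desired, g=0.01, sweeps=500)
```

With a small constant the weights stay close to the values d/2, d/2 and -d/2 obtained analytically in section 2.1, while continuing to shift slightly after each pattern because the individual errors never reach zero.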


³ In "real" steepest descent $S$ would be given by $S = f_1 + f_2 + \cdots + f_M$
where $f_1, f_2, \ldots, f_M$ were evaluated before any
adjustment was made.
CHAPTER III
ADALINE IN CONTROL OF A PLANT 


3.1 The Equations of the Plant 

In this chapter Adaline is taught to recognize the
behaviour of a stable plant and to produce the correct control
signal. The plant chosen is a second order system which
consists of a motor and a load. The power supplied to the
motor is controlled by a relay. The system is stable by
virtue of unity and velocity feedback. It is shown in Figure 19.

$$d(s) = r(s) - K_v\, s\, c(s) - c(s)$$

Figure 19


The differential equations of the plant can be obtained,
as a function of time, from the block diagram. Hence the
output, $c(t)$, and the output rate, $\dot{c}(t)$, can be calculated,
as functions of time, for various initial values of $c(t)$
and $\dot{c}(t)$, rather than supplying the plant with various desired
outputs, $r(t)$. The voltage which is applied to the plant
is $\pm V$ volts, depending on the sign of the relay input $d(t)$.
From Figure 19,

$$d(t) = r(t) - c(t) - K_v\, \dot{c}(t) \tag{3.1}$$

and choosing to set $r(t) = 0$, the switching condition for
the voltage, $V$, is given by:

$$c(t) = -K_v\, \dot{c}(t) \tag{3.2}$$


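The equations for output and output rate are:

$$c(t) = Kt + \left(C_1 + \frac{C_2}{b} - \frac{K}{b}\right) + \frac{K - C_2}{b}\,e^{-bt} \qquad (3.3)$$

$$\dot{c}(t) = K - b\left(\frac{K}{b} - \frac{C_2}{b}\right)e^{-bt} \qquad (3.4)$$

where C_1 = c(0), the initial value of c(t),
      C_2 = ċ(0), the initial value of ċ(t),
and K = ±kV.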
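
As a check on the two equations, they can be evaluated directly. The sketch below assumes the reading of equations 3.3 and 3.4 in which the constant terms are divided by the plant constant b; the function name and the numerical values in the usage line are hypothetical.

```python
import math

def plant_response(t, c1, c2, K, b):
    """Return (c(t), c_dot(t)) from equations 3.3 and 3.4.

    c1 and c2 are the initial output and output rate; K is +kV or -kV.
    """
    decay = math.exp(-b * t)
    c = K * t + (c1 + c2 / b - K / b) + (K - c2) / b * decay
    c_dot = K - (K - c2) * decay
    return c, c_dot

# Illustrative values only: b = 1.0 is an assumption, not a figure from the text.
c_now, c_dot_now = plant_response(0.5, 3.0, -2.0, 10.0, 1.0)
```

At t = 0 the function returns the initial conditions, and for large t the output rate tends to K, which is what the two equations require.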


A convenient way of presenting this information is to use
the phase plane. If the system error, e(t), is defined as
e(t) = r(t) - c(t), where r(t) is the desired output
and c(t) the actual output, then a plot of error, e(t),
versus error rate, ė(t), is called a trajectory in the phase
plane. Since "inputs" are applied to the system as initial
conditions of c(t), ċ(t) with r(t) = 0, then e(t) = -c(t),
ė(t) = -ċ(t). Hence equations 3.3 and 3.4 define a
trajectory in the phase plane which describes the system
behaviour for a given set of initial conditions. Since K can
be positive or negative, depending on the sign of the input
to the relay, d(t), the trajectory obeys two equations, A and
B. This is indicated in Figure 20. The switching condition
has been stated in equation 3.2. This is a line with
a slope, -1/K_t, in the phase plane. The values chosen for
the control system were: K_t = 0.2, K = ±10, and


Ole 
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Figure 20 
Using the analytical solutions for c(t) and ċ(t) a computer
program was constructed to produce sample trajectories in
the phase plane for comparison with the attempts produced
by Adaline. One of them is shown in Figure 21.

The objective of training Adaline is to enable it to
recognize combinations of e(t) and ė(t) in order that it
may produce a response, a(t), which is close to the correct
input to the relay, d(t). During training it is desirable
to present many combinations of e(t) and ė(t), so that when
Adaline later acts as a controller it will have enough
"experience" (or have seen most of the phase plane) to
make a correct decision and produce a good value of a(t)
to drive the relay. To train Adaline in a practical
situation various initial conditions, c(0) and ċ(0), would
be set and the resulting trajectories would generate values
of e(t), ė(t) and d(t), which would be presented to
Adaline. This is illustrated in Figure 22.

Figure 22

It is unnecessary to follow this procedure exactly in a digital
computer simulation. Since it is known that the input to the
relay, d(t), is given by d(t) = e(t) + K_t ė(t) everywhere in the
phase plane, Adaline is, in this study, presented with a large
number of randomly selected points with coordinates, e(t)
and ė(t), and is trained to produce an analogue output, a(t),
of value as close as possible to d(t) for these points.
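
The generation of training points can be sketched as below. The names are hypothetical, the uniform distribution is an assumption, and the range of ±10 is taken from the limit on initial conditions used in the tests of section 3.2.

```python
import random

K_T = 0.2      # tachometer constant quoted in section 3.1
LIMIT = 10.0   # assumed sampling range for e and e_dot

def training_point(rng):
    """Return (e, e_dot, d) for one randomly selected phase-plane point."""
    e = rng.uniform(-LIMIT, LIMIT)
    e_dot = rng.uniform(-LIMIT, LIMIT)
    d = e + K_T * e_dot          # desired relay input at this point
    return e, e_dot, d

rng = random.Random(0)           # fixed seed so that a run can be repeated
samples = [training_point(rng) for _ in range(5)]
```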
3.2 Coding of the Phase Plane and the Training of Adaline 

A practical Adaline is limited in size by having a
finite number of weights and the number of weights limits
the number of patterns which Adaline can attempt to separate [9].
When the control system responds to a set of initial conditions,
the error, e(t), and error rate, ė(t), vary continuously
until a steady state condition is reached. If the values
of e(t), ė(t) and d(t) are sampled frequently, patterns
could be assigned to the points with coordinates, e(t)
and ė(t), in the phase plane. This would yield a very
large number of patterns and an Adaline with a large number
of weights would be required. Although, in this study, the
values of e(t) and ė(t) were sampled frequently, the above
situation was avoided by dividing the phase plane into a
small number of regions to which certain patterns were
assigned. This ensures that only a small number of patterns
will be presented to Adaline. It was also necessary to
choose the patterns, or codes, in such a way that it will be
possible for Adaline to separate groups of the patterns into
the appropriate two classes. If this last requirement is met,
it is said that the patterns are linearly separable.

A linearly separable code which has been used by Widrow [4]
is indicated in Figure 23. The variable, X, which can take
on an infinite number of values between a and f,
is coded by choosing a reference or origin at a on the X axis
and dividing the region to the right of this reference into
segments. The total number of segments determines the number
of digits in the code. The code assigned to each region is
obtained by moving the digit 1 one space to the left as the
variable, X, moves from one segment to another. This ensures
that there is a difference of two digits in the code between
any two regions, which may be a factor in explaining the
separability of the code (Treado [6]).

    Segment        Code
    a ≤ X < b      0001
    b ≤ X < c      0010
    c ≤ X < d      0100
    d ≤ X < f      1000

Figure 23
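
A minimal sketch of this coding rule is given below. Equal segment widths and the clamping of out-of-range values to the end segments are assumptions made for the example, and the function name is hypothetical.

```python
def segment_code(x, origin, width, n_segments):
    """Code x as a string with a single 1 that moves left as x increases."""
    index = int((x - origin) // width)
    index = max(0, min(n_segments - 1, index))   # clamp to the end segments
    digits = ["0"] * n_segments
    digits[n_segments - 1 - index] = "1"         # segment 0 gives ...0001
    return "".join(digits)

codes = [segment_code(k + 0.5, 0.0, 1.0, 4) for k in range(4)]
```

Any two of the four codes produced differ in exactly two digit positions, which is the property noted in the text.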

The phase plane can now be divided into squares, for
example, and a pattern assigned to each square. One method of
coding e(t) and ė(t) separately has been discussed. If this
were done, a group of points in a square in the phase plane
could have a coded value for e(t) and a coded value for ė(t).
A pattern which could be associated with points lying within
a given square in the phase plane can be obtained by placing the
coded values for e(t) and ė(t) side by side. This means that
if e(t) has a value such that it is in segment 0001, and ė(t)
has a value such that it is in segment 0100, the pattern for
the point with coordinates, e(t), ė(t), in the given square
is [e ė] or 00010100.
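
The side-by-side arrangement can be sketched as follows. The function names are hypothetical; the range of ±10 with the reference at -10 is the one used for the tests in this section, and equal segment widths are assumed.

```python
LIMIT = 10.0   # phase plane limited to |e|, |e_dot| <= 10

def axis_code(value, n):
    """n-digit code for one coordinate, with the reference at -LIMIT."""
    width = 2 * LIMIT / n
    index = min(n - 1, max(0, int((value + LIMIT) // width)))
    return "0" * (n - 1 - index) + "1" + "0" * index

def square_pattern(e, e_dot, n):
    """Pattern for the square containing (e, e_dot): code of e, then code of e_dot."""
    return axis_code(e, n) + axis_code(e_dot, n)

example = square_pattern(-9.0, 1.0, 4)
```

With four segments per axis, a point with e = -9 and ė = 1 falls in segments 0001 and 0100 respectively, so the example reproduces the pattern 00010100 quoted above.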

In all the tests described here the possible initial
conditions for the control system were limited so that the
maximum absolute value of e(t) and ė(t) was 10. This section
of the phase plane was then divided into squares. By choosing
a reference at e(t) = -10, ė(t) = -10, e(t) and ė(t) were
suitably coded and hence a pattern was assigned to each
square. A division of the phase plane into 25 squares is
shown in Figure 24. The fineness with which the phase plane
is divided determines the number of digits per pattern and
this fixes the number of weights in the Adaline.

Figure 24

If the e(t) and ė(t) axes are divided into n sections
each, then the pattern for each square has 2n digits. It
can be seen that the size of Adaline increases linearly with
the fineness of the division. The more alarming feature is
that the number of patterns increases with the square of
the number of sections, n^2, and the question arises as to whether
Adaline is capable of separating n^2 patterns into distinct
classes when the number of weights is 2n.

By using various sizes of grids on the e(t), ė(t)
plane, several Adalines were trained (according to the Mean
Square Error scheme of Chapter I) to "duplicate" the performance
of the control system of section 3.1. As in Chapter I, each
of the zeros in the patterns was replaced by the digit -1.
A pseudo-random number generator was used to produce values
of e(t) and ė(t) which were then coded. The function, d(t),
was calculated from equation 3.1 and, together with the
coded values of e(t) and ė(t), presented to Adaline.
After each pattern was presented the weights were adjusted
and the next pattern read in. After a suitable training
period the weights were fixed and Adaline was tested with
one pattern from each square of the grid.

The output of the trained Adaline, a(t), is positive or
negative, depending on which pattern is presented. If a line
is drawn in the phase plane separating squares for which a(t)
is positive from those with negative a(t), this can be
called the switching line which Adaline has produced, and
the training of Adaline can be considered as the training of
a function generator.
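
The training and testing procedure just described can be sketched as below. This is an illustration under stated assumptions, not the original program: the function names, the uniform sampling, the equal sharing of each correction among the weights and the threshold, the adjustment constant of 0.01 and the 20000 training points are all choices made for the example. K_t = 0.2 and the limit of 10 are taken from the text.

```python
import random

K_T = 0.2       # tachometer constant from section 3.1
LIMIT = 10.0    # phase plane limited to |e|, |e_dot| <= 10

def pattern(e, e_dot, n):
    """2n-element pattern of +1/-1 for the square containing (e, e_dot)."""
    width = 2 * LIMIT / n
    x = []
    for value in (e, e_dot):
        idx = min(n - 1, max(0, int((value + LIMIT) // width)))
        code = [-1.0] * n            # zeros of the code replaced by -1
        code[n - 1 - idx] = 1.0      # single +1, moving left as value grows
        x.extend(code)
    return x

def train(n, points, g, seed=0):
    """Train on random points; returns (weights, threshold)."""
    rng = random.Random(seed)
    weights, threshold = [0.0] * (2 * n), 0.0
    for _ in range(points):
        e, e_dot = rng.uniform(-LIMIT, LIMIT), rng.uniform(-LIMIT, LIMIT)
        x = pattern(e, e_dot, n)
        d = e + K_T * e_dot                          # desired relay input
        a = threshold + sum(w * xi for w, xi in zip(weights, x))
        step = g * (d - a) / (2 * n + 1)             # equal share per element
        weights = [w + step * xi for w, xi in zip(weights, x)]
        threshold += step
    return weights, threshold

def switching_map(weights, threshold, n):
    """Output sign for one pattern per square (rows run from high to low e_dot)."""
    width = 2 * LIMIT / n
    centres = [-LIMIT + (i + 0.5) * width for i in range(n)]
    rows = []
    for e_dot in reversed(centres):
        row = ""
        for e in centres:
            x = pattern(e, e_dot, n)
            a = threshold + sum(w * xi for w, xi in zip(weights, x))
            row += "+" if a > 0 else "-"
        rows.append(row)
    return rows

w, th = train(n=4, points=20000, g=0.01)
for row in switching_map(w, th, 4):
    print(row)
```

The boundary between the "+" and "-" entries in the printed map is the switching line produced by the simulated Adaline. Larger values of g or fewer points can be substituted to see how the boundary moves.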

Several points were examined using the scheme which
has been outlined:

1. The effect of the adjustment constant, g, on
   the switching line which is obtained when the
   training process is complete.

2. The effect of the number of training cycles on
   the final switching line.

3. Convergence of values of weights and threshold
   as training proceeds.

4. The capacity of Adaline.

These four points were examined using a digital computer
simulation of the coding scheme and Adaline as described
in the previous paragraph. A training cycle consists of
fixing the adjustment constant, g, and presenting
Adaline with patterns generated by a fixed number of
points in the phase plane. The training cycle is repeated
for various numbers of points and various values of g.

Assuming that Adaline is trained with a very large
number of randomly distributed points in the phase plane,
the switching line which Adaline will produce, when trained,
can be predicted.

For a grid of 16 squares, and hence an Adaline of
eight weights and a threshold, Adaline's attempt at
duplicating a switching line of slope -5 is shown
in Figure 25.

Figure 25

For points occurring in all the squares,
except those numbered 2, 6, 11 and 15, the input to the
relay, d(t), is either positive for all points in a square
or negative for all points in the square. There is no
difficulty in training Adaline to differentiate between
such squares. Points in squares 2, 6, 11 and 15 can,
however, present Adaline with conflicting information
during the training phase. In square 2, for example, all
points, when coded, yield the same pattern, 00101000. Since
the switching line passes through this square the points, and
hence the pattern, can correspond to both positive and
negative values of d(t). Adaline has, therefore, the problem
of placing the same pattern into two different classes. Using
the assumption of a statistically uniform distribution of
points it can be seen that Adaline will see more values of
negative d(t) than positive d(t) during training.

After training, therefore, the weights and threshold
will have values such that the pattern of square 2 is
classed negative. The same argument applies to squares 6, 11
and 15. It is on this basis that a prediction of Adaline's
attempt at producing a switching line is made.
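
The prediction can be sketched numerically. For a straight switching line and uniformly distributed points, the majority sign of d(t) within a square is the sign of d(t) at the centre of the square, since a line that misses the centre leaves more than half of the square on the centre's side. The sketch below uses this shortcut, which is introduced here for illustration and is not taken from the study; the function name and the indexing of squares are hypothetical.

```python
K_T = 0.2      # tachometer constant from section 3.1
LIMIT = 10.0   # phase plane limited to |e|, |e_dot| <= 10

def predicted_signs(n, tol=1e-9):
    """Predicted output sign for each square of an n-by-n grid.

    Returns {(i, j): '+', '-' or '?'}, where i indexes e and j indexes
    e_dot, both counted from the reference at -LIMIT. '?' marks a square
    bisected by the switching line, for which the output is uncertain.
    """
    width = 2 * LIMIT / n
    centres = [-LIMIT + (k + 0.5) * width for k in range(n)]
    signs = {}
    for i, e in enumerate(centres):
        for j, e_dot in enumerate(centres):
            d = e + K_T * e_dot            # relay input at the square centre
            signs[(i, j)] = "?" if abs(d) < tol else ("+" if d > 0 else "-")
    return signs

uncertain = {n: sum(s == "?" for s in predicted_signs(n).values()) for n in (4, 5, 6)}
```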

For a grid of 25 squares, and an Adaline with ten
weights and a threshold, the prediction of the switching
line produced, after training, is shown in Figure 26.
In this case the weights and threshold, after training,
should be such that squares 3 and 8 will yield positive
output and squares 18 and 23 negative output. The output
of Adaline in response to a pattern from square 13 is uncertain.
Of the patterns presented to Adaline from square 13, statistically,
half of them would correspond to positive values of d(t)
and half of them to negative values of d(t).

For a grid of 36 squares and an Adaline of 12 weights
and one threshold, the prediction of the switching line
produced by Adaline, after training, is shown in Figure 27.
From previous arguments it can be seen that, for squares 9
and 15, Adaline will produce a negative output after training
and, for squares 22 and 28, Adaline will produce a positive
output. Since the actual switching line bisects squares 3
and 34 the output Adaline will produce, after training, when
presented with the patterns of squares 3 and 34 is uncertain.
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Examples of the switching lines actually produced by 


Adaline, after training, are shown in Figure 28.

Figure 28

Variations of these switching lines were produced by using
different values of the adjustment constant, g, and with
different numbers of training cycles. The four points
mentioned earlier in the section are now discussed.

The size of adjustment can be considered as affecting
the position of the switching line in the following way.
For values of g close to or less than the optimum value
(as discussed in section 1.3) the switching line produced,
after training on a large number of points, is very close
to the predicted line. Variations occurred when the values
of g were greater than the optimum value. When the
adjustment is large, the last few patterns presented before
training is stopped will have a disproportionately large
effect on the weights. The last few patterns may not be
distributed evenly among the squares and hence Adaline may
be biased and produce a peculiar switching line. Two examples
are shown in Figure 29. When the size of adjustment is
small this effect is less.

The effect on the switching line of the number of points
presented during training is similar to that due to different
values of the constant, g, but is not so pronounced. The
squares which are affected most are those for which, during
training, there are both positive and negative values of d(t).
The total number of points presented determines the number
which occur in a given square. Although the random number
generator produces a statistically random spread of points,
it may be cut off at a stage when there has been a distinct
bias toward one region of the phase plane. The number of
points presented during testing varied between 100 and 1000.
In the case of a grid of 16 squares the switching line is
unaffected. In the grid of 25 squares, square 13 of Figure 26
yields outputs both negative and positive, depending on the
number of training points. In the case of a grid of 36
squares the outputs produced by squares 3 and 34 oscillate
between positive and negative values.

When examined during training the weights and threshold
are seen to oscillate about a mean value. Given a value of g
less than or equal to the optimum value, the weights and
threshold oscillate around values which yield a switching
line close to the predicted switching line. The oscillation
is a result of the presentation of conflicting information
from squares cut by the switching line of the plant.

Widrow [9] has stated that "the statistical capacity
of Adaline is twice the number of weights." By this 'rule'
the Adaline with eight weights would be able to classify the
16 patterns which it receives but the 10- and 12-weight
Adalines would have difficulty in attempting to classify the
25 and 36 patterns which they receive. The results obtained
indicate that this does not appear to be the case. In an
attempt to discover if there is a limit to the number of
patterns which Adaline can separate, Adalines with 14, 16,
18, 20, 22 and 24 weights were trained. Several thousand
points were presented to each Adaline during training and,
in all cases, the switching line produced after training
was very close to that predicted. Variations occurred only
when squares were cut by the switching line of the control
system. Since an upper limit did not appear (an Adaline with
24 weights separated 144 patterns into two classes) further
study of Adaline's capacity is indicated.
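
The arithmetic behind this comparison can be tabulated with a short sketch. The function name is hypothetical; the counts follow from n digits per axis, one pattern per square, and the quoted rule of twice the number of weights.

```python
def grid_requirements(n):
    """Return (weights, patterns, rule_capacity) for an n-by-n grid."""
    weights = 2 * n            # n digits for e plus n digits for e_dot
    patterns = n * n           # one pattern per square
    return weights, patterns, 2 * weights

for n in (4, 5, 6, 12):
    weights, patterns, capacity = grid_requirements(n)
    print(n, weights, patterns, capacity, patterns <= capacity)
```

Only the grid of 16 squares lies within the quoted capacity; the grids of 25, 36 and 144 squares exceed it by an increasing margin.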
3.3 Adaline Acting as the Controller 

A digital computer simulated Adaline was now placed in
control of the simulated plant described in section 3.1.
The desired output, r(t), was set to zero, and 'inputs' were
applied by setting initial conditions, c(0) and ċ(0), at
the output shaft. Trajectories in the phase plane were
obtained for various initial conditions.

A block diagram showing Adaline connected to the plant
is shown in Figure 30. The weights and threshold chosen
for these tests were those which gave switching lines as
close as possible to the predicted lines of Figures 25,
26 and 27.
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The essence of computer calculations is as follows. 
The initial conditions c(0), ċ(0) are presented to the
error, error-rate calculator and yield e(0), ė(0). The
values e(0), ė(0) are presented to the coder and time, t, is
set to zero. A pattern, which depends on the square in which
the point with coordinates, e(t), ė(t), lies, is generated
and applied to Adaline. Adaline produces an output, a, and
this causes the output from the relay to assume a definite
sign. Using equations 3.3 and 3.4, and a small increment of
time (10 ms), c(t) and ċ(t) are calculated. These values define
the second point of the phase trajectory. c(t) and ċ(t)
are now applied to the error, error-rate calculator and the
calculations are repeated. The values of e(t) and ė(t)
are stored for printing and for drawing a graph of ė(t) against
e(t), the phase trajectory. The input to the relay, a(t),
is examined after each increment of time. If it has changed
sign, the trajectory has crossed the switching line and the
sign of K in equations 3.3 and 3.4 is then reversed.
Time is set to zero and the initial conditions, c(0), ċ(0),
assume the values of c(t) and ċ(t) that existed immediately
before switching occurred. Since the increments of time are
small, these values of c(t) and ċ(t) are close to the values
at the time of switching. The calculations described are
continued until it is clear that the trajectory is either
converging to zero error or to a steady state value of error.
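
The loop described above can be sketched as follows. Several details are assumptions made for the example: the plant constant b is set to 1.0, the state is restarted from its current value at every 10 ms increment (which follows the same analytic curve between switchings), and the plant's own switching function stands in for the coder and Adaline. A trained Adaline would be substituted by passing a different `controller` function. All names are hypothetical.

```python
import math

K_T = 0.2     # tachometer constant (section 3.1)
K_MAG = 10.0  # magnitude of K (section 3.1)
B = 1.0       # plant constant b: assumed value, not taken from the text
DT = 0.01     # 10 ms time increment

def step(c, c_dot, K, dt=DT, b=B):
    """Advance (c, c_dot) by dt with constant K, using equations 3.3 and 3.4."""
    decay = math.exp(-b * dt)
    c_next = K * dt + (c + c_dot / b - K / b) + (K - c_dot) / b * decay
    c_dot_next = K - (K - c_dot) * decay
    return c_next, c_dot_next

def ideal_controller(e, e_dot):
    """Stand-in for the coder and Adaline: the plant's own switching function."""
    return e + K_T * e_dot

def simulate(c0, c_dot0, controller=ideal_controller, duration=20.0):
    """Return the phase trajectory as a list of (e, e_dot) points, with r(t) = 0."""
    c, c_dot = c0, c_dot0
    trajectory = []
    for _ in range(int(round(duration / DT))):
        e, e_dot = -c, -c_dot                    # error and error rate
        trajectory.append((e, e_dot))
        K = K_MAG if controller(e, e_dot) >= 0 else -K_MAG   # ideal relay
        c, c_dot = step(c, c_dot, K)
    trajectory.append((-c, -c_dot))
    return trajectory

path = simulate(8.0, 2.0)
```

Plotting the second element of each point in `path` against the first gives the phase trajectory for the chosen initial conditions.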

In one test, an Adaline with eight weights and a threshold
was used to control the plant. The values of the weights and
threshold were those which yielded the switching line of
Figure 28(a) and are given in Figure 31. This is the switching
line of a relay operated plant, with no velocity feedback.
The system in Figure 30 should exhibit the same response.

The trajectory in response to initial conditions, c(0) = -& ’ 
é(o) = -? is shown in Figure 31. The response is highly 
oscillatory, but stable, and the error is, eventually, reduced 

to zero. A similar response would be obtained for other 
initial conditions. 

In considering an Adaline with ten weights and a threshold,
corresponding to the switching line of Figure 28(b), it can
be seen that the two vertical portions of the switching
line will result in the relay switching early in the course of
a trajectory, so that, after switching, the trajectory will
tend towards the origin. The horizontal portion of the
switching line, however, causes trouble. Consider a trajectory
which crosses the switching line. If it is trajectory B of
Figure 20, then the relay will change sign and the trajectory
becomes trajectory A of Figure 20. Trajectory A then crosses
the switching line and switching again occurs. This chattering
will continue until the trajectory crosses the vertical line.
This is best illustrated by Figure 32 for which the initial
conditions are c(0) = 8, ċ(0) = 2. The final solution
is damped but settles to a steady state error of 2. Tests
were then run with many values of initial conditions (see also
Figure 33) but the main features of the trajectories were
always as described. It should be noted that all trajectories,
except one, will result in chattering and a final steady
state error of 2. The exception is illustrated in Figure 34 where
the initial conditions are c(0) = -& ; ċ(0) = TZ . It can be
seen that switching occurs early, and, after switching, the
trajectory tends towards the origin. The trajectory continues
past the e(t) axis, meets the horizontal switching line,
chatters and, finally, settles to a steady state error of 2.
There will, however, be a trajectory, close to that described,
which would arrive at the origin after the relay has switched
once.
Tests were also made using an Adaline with 12 weights
and a threshold, corresponding to the switching line of
Figure 28(c). As in the example with 10 weights and a threshold,
the two vertical portions of the switching line will
cause early switching and the two horizontal portions are so
placed that the relay will chatter until c(t) and ċ(t) reach
values such that the trajectory crosses the central, vertical
line. The trajectory will, in all cases, settle to a steady
value of zero error and error rate. This is illustrated in
Figure 35--for which the initial conditions are c.(0) = 3) ; 

C (0) = 7 . It can be seen that certain values of initial 
conditions would result in a trajectory switching only on the 
central portion of the switching line. This is illustrated in 


Figure 36--for which the initial conditions are co) =Z; 


Co) =-5. | 







Finer division of the phase plane would result in
more accurate reproduction of the desired switching line of the
controlled plant, but Adaline's attempt would still consist
of vertical and horizontal portions. The examples given
show the type of response which can be expected.
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CHAPTER IV 
CONCLUSIONS 


The work described in Chapter III has shown that it
is possible for an Adaline to learn to control a simple
plant. Its ability as a controller is poor, however, and
in this section methods of improving the basic scheme are
considered. The concept of training a network to perform
a desired function is also discussed.
4.1 Adaline and a Dead Zone 

In all cases discussed in section 3.3 the combination
of Adaline and the plant resulted in a stable control system,
but Adaline's control ability was poor in regions near the
origin of the phase plane, with either lightly damped
oscillations about the origin or lightly damped oscillations
about a steady state error. One method of eliminating this
would be to use Adaline in conjunction with a relay with a
small dead zone. When the trajectory enters the central
square, or squares, Adaline would be disconnected and, with
the relay in the dead zone, the output shaft of the plant
would coast. This method would probably yield a better
response than can be obtained using Adaline with an
ideal relay. At worst a limit cycle condition could persist
but, if the central square were small, this could be tolerated.
In the physical realization of this scheme, a source of
difficulty would lie in deciding when the trajectory leaves
and enters the central square so that Adaline could be
connected or disconnected.
4.2 Coarse and Fine Division of the Phase Plane 

Another method of improving Adaline's response near
the origin could consist of dividing the central square,
or squares, into the same number of squares as the major
grid. The pattern codes of squares in the large grid would
then also be assigned to corresponding squares in a fine
grid. This is illustrated in Figure 37 for a division of the
phase plane into 25 squares.

Figure 37

If the weights and threshold
correspond to the switching line of Figure 28(b), by
referring to section 3.3, it can be seen that most
trajectories will now finish with a steady state error of
2/5, instead of 2, since the grid is now five times finer
near the origin. This scheme can be realized by altering
the coding box in Figure 30. When the trajectory enters
the central square, the values of e(t) and ė(t) are
amplified by 5 before being applied to the coder. This is
indicated in Figure 38. As in the scheme of section 4.1,
difficulties would arise in deciding when the trajectory
enters and leaves the central square. It should also be
noted that this method is applicable only when Adaline is
trained on a linear switching line.
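
The alteration to the coding box can be sketched as a scaling step placed ahead of the coder. The function name and its parameters are hypothetical, and the sketch assumes an odd number of divisions per axis so that a single central square exists.

```python
def scale_for_coder(e, e_dot, limit=10.0, n=5):
    """Return the (e, e_dot) values handed to the coder.

    Inside the central square of an n-by-n grid over [-limit, limit] the
    values are amplified by n, so that the central square is itself coded
    as an n-by-n grid using the same pattern codes.
    """
    half = limit / n                       # half-side of the central square
    if abs(e) < half and abs(e_dot) < half:
        return e * n, e_dot * n
    return e, e_dot

inside = scale_for_coder(1.0, -0.5)    # within the central square: amplified by 5
outside = scale_for_coder(3.0, 0.0)    # outside the central square: unchanged
```

For the division into 25 squares the central square extends to ±2, so amplification by 5 maps it onto the full ±10 range expected by the coder.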
4.3 Polar Coding Scheme 

In Chapter III the controlled plant had a switching
line which Adaline found difficult to reproduce due to the
coarseness of the division of the phase plane. One possible
approach to the division problem, in the case of this specific
plant, would be to divide the phase plane into regions bounded
by concentric circles, with their centre at the origin, and
by radii extending from the origin. The resulting 'curved'
rectangles are those to which patterns can be assigned
using a coding scheme such as that discussed in section 3.2.
This is best illustrated by considering the example with
six equal angular divisions and six radial divisions
in Figure 39.

If a point with coordinates, e(t), ė(t), in the phase
plane is to be coded and assigned a pattern, it is first
necessary to convert the coordinates of the point into
polar coordinates. This complication is trivial
as far as a computer simulation is concerned, but if
hardware implementation is to be considered, the additional
complexity would make this method of dividing the phase
plane less attractive than rectilinear division.
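
A minimal sketch of polar coding is given below. It assumes equal-width rings out to a radius of 10 and equal sectors covering the full circle, with the radial code placed before the angular code; none of these details is taken from the text, and the function name is hypothetical.

```python
import math

def polar_pattern(e, e_dot, n_radial=6, n_angular=6, r_max=10.0):
    """12-digit pattern for the polar region containing (e, e_dot)."""
    radius = math.hypot(e, e_dot)
    angle = math.atan2(e_dot, e) % (2 * math.pi)        # 0 <= angle <= 2*pi
    ring = min(n_radial - 1, int(radius / (r_max / n_radial)))
    sector = min(n_angular - 1, int(angle / (2 * math.pi / n_angular)))
    radial_code = ["0"] * n_radial
    radial_code[n_radial - 1 - ring] = "1"              # single 1 per code
    angular_code = ["0"] * n_angular
    angular_code[n_angular - 1 - sector] = "1"
    return "".join(radial_code + angular_code)

near_origin = polar_pattern(1.0, 0.1)
far_out = polar_pattern(0.0, 9.0)
```

Each pattern has 12 digits, matching an Adaline with 12 weights, and contains exactly two 1s: one locating the ring and one locating the sector.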

Using a digital computer simulation, an Adaline with
12 weights and a threshold was trained on the controlled
plant of section 3.1. The program was very similar to that
described in section 3.2. A pseudo-random number generator
produced the coordinates, e(t), ė(t), of a point in the
phase plane. They were then converted to polar coordinates,
and presented to the coder. The resulting pattern and d(t),
as calculated from equation 3.1, were presented to Adaline
and adjustments were then made to the weights. The calculation
was then repeated for different points. After a few training
cycles the weights were fixed and the response of Adaline,
when presented with a pattern from each square, examined.
The switching line produced, after training with 700 points
and an adjustment constant of 0.03, is shown in Figure 40.
As has been discussed in section 3.2, the distribution of
points affects the training, particularly the response from
patterns corresponding to squares or areas cut by the switching
line of the plant. The switching line produced, after
training with 300 points and an adjustment constant of 0.03,
again illustrates this point (Figure 41).
It can be seen that an Adaline adequately trained in this
manner would be able to control the plant. The results of a
computer program, using the weights and threshold which
correspond to the switching line of Figure 40, verified this.
The phase trajectory was identical to that of a second order
plant, compensated with unity and velocity feedback, and with
the tachometer constant determined by the slope of the
switching line in Figure 40.

4.4 General Remarks 

The main purpose of this study has been to verify that 
a pattern recognition device, if suitably trained, can take 
the place of the controller of a plant. Some of the possibilities 
of control using Adaline, and some of the associated problems 
have been revealed in the course of this work. 

It is instructive to consider the peculiar nature of
this particular recognition problem and to this end comparison
is made with a simple form of the weather forecasting scheme
discussed by Hu [10]. Weather maps, containing information
on barometric pressure over a wide area, are the source of the
patterns to be presented to Adaline. The weather (wet or dry)
on the following day at location B is the "desired output."
The map is divided into squares and the squares generate +1
or -1 pattern elements according as the pressure is higher
or lower than normal. The pattern, consisting of the array
of pattern elements, and the resulting weather at B are presented
to Adaline. If wet is assigned to -10 and dry to +10, Adaline
is trained to produce the appropriate output. Training consists
of presenting Adaline with many weather maps, and the resulting
conditions at B. After training is complete, Adaline can,
given a weather map, estimate the weather on the following
day. In this recognition problem, given a large amount of
information (over the area of the map), the response at a
particular place can be evaluated. In the control system,
the recognition problem is the following: given information
on the error and error rate of the plant at an instant
(the coordinates of a point in the phase plane) what is the
input to the plant to force the error and the error rate to
tend to zero? It is not possible in this case to examine the
overall situation to assist in making the correct decision.
To date this problem has been approached by assigning patterns
to regions of the phase plane, deciding into which region a
point falls, and making a decision on receipt of this pattern.
It can be seen that a very small amount of information is
presented to Adaline at each stage. A possible area of study
would be to search for a better way of presenting the information
about the point, and possibly about the desired final point,
to Adaline at each stage. For the control problem studied
the specific difficulties encountered are now discussed.

As has been mentioned, Adaline found it difficult to
reproduce the linear switching line of the controlled plant
because of the coarseness of division of the phase plane.
It would, however, require a very fine division for Adaline
to be able to reproduce a line accurately. The resulting
size of the Adaline would make practical realization, using
memistors Ci] as the adaptive elements, prohibitively expensive 
even where only two states are coded. The polar coding scheme 
offers a partial solution to the problem of reproducing this 
particular switching line. There may also be cases in which 
the controlled plant has an unknown and complex switching 
line, which would be better tackled by a polar division 

than by rectilinear division. The problem of the coarseness 
of the division and the resulting undesirable features of 

the trajectory poses a difficult problem for which a solution 
is not immediately obvious. 

Directly associated with the problem of the coarseness
of the divisions is the rapid increase of patterns, and the
number of digits per pattern, with increasing fineness of
the quantization. It was shown, experimentally, that 144
patterns generated in the quantized phase plane could be
successfully separated using 24 weights and a threshold.
The statement on the capacity of Adaline by Widrow [9],
discussed in section 3.2, deals, presumably, with all possible
patterns which can be generated by permutations of the digits
applied to Adaline. In presenting Adaline with 144 patterns
from the phase plane, only a small and carefully controlled
sample of the total number of patterns which can be obtained
from the permutations of the 24 digits is used.

In this study Adaline was trained on a plant which was 
known to perform well with a conventional controller. It did, 
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in fact, prove possible to train Adaline to act as a (rather 
crude) controller, but if this is to remain more than an academic 
exercise, the question must be asked if this idea can be applied 
to more complicated situations. If Adaline is to be considered 
seriously as a controller in some given situation, it must 
be able to control competently, and it must show a definite 
advantage over conventional controllers. There is one such
area where Adaline might be used: where the aim is to duplicate
the response of a human operator, or, more generally, where
the controller must learn to control in the absence of
information about the dynamics of the plant and its
existing controller. In the case of the human operator the
main problem would be to present Adaline with the correct 
signals. The operator can be considered as having a switching
surface (or hypersurface) which depends on his past experience
in acting on observations of many variables. It may prove
possible to train Adaline to duplicate his performance so that
Adaline could eventually take over the operator's task. A
by-product of this process may be the identification of some of
the characteristics of the controller by examining Adaline's
weights at the end of the training cycle.
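
The idea of training on an observed controller and then inspecting the weights can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the 12 by 12 division, the one-digit-per-strip coding in `encode`, the stand-in relay `controller` with switching line e + 0.5*de = 0, and the step size `alpha` are all hypothetical. The update rule is the Widrow-Hoff minimum square error rule applied to the output before quantization.

```python
import numpy as np

N_CELLS = 12  # hypothetical: 12 strips per axis -> 144 cells, 24 digits

def encode(i, j):
    """Code cell (i, j) as 24 digits of +1/-1 plus a constant threshold input."""
    x = -np.ones(2 * N_CELLS + 1)
    x[i] = 1.0             # strip occupied along the first axis
    x[N_CELLS + j] = 1.0   # strip occupied along the second axis
    x[-1] = 1.0            # threshold input, always +1
    return x

def controller(i, j):
    """Stand-in 'teacher': a relay switching on the line e + 0.5*de = 0."""
    e = (2 * i - (N_CELLS - 1)) / N_CELLS   # cell-centre coordinates
    de = (2 * j - (N_CELLS - 1)) / N_CELLS
    return -1.0 if e + 0.5 * de > 0 else 1.0

def train_adaline(patterns, targets, epochs=100, alpha=0.05, seed=0):
    """Widrow-Hoff (LMS) adaption on the error before the quantizer."""
    rng = np.random.default_rng(seed)
    w = np.zeros(patterns.shape[1])
    mu = alpha / patterns.shape[1]          # step scaled by number of inputs
    for _ in range(epochs):
        for k in rng.permutation(len(patterns)):
            err = targets[k] - patterns[k] @ w
            w += mu * err * patterns[k]
    return w

cells = [(i, j) for i in range(N_CELLS) for j in range(N_CELLS)]
X = np.array([encode(i, j) for i, j in cells])
d = np.array([controller(i, j) for i, j in cells])

w = train_adaline(X, d)
y = np.where(X @ w >= 0, 1.0, -1.0)         # quantized Adaline output
agreement = np.mean(y == d)
mse = np.mean((d - X @ w) ** 2)

print(f"agreement with teacher: {agreement:.3f}, mean square error: {mse:.3f}")
print("weights, first axis: ", np.round(w[:N_CELLS], 2))
print("weights, second axis:", np.round(w[N_CELLS:2 * N_CELLS], 2))
```

With this coding each weight belongs to one strip of one axis, so the printed weights give a rough picture of how the teacher's decision depends on each variable. A minimum square error fit need not reproduce the teacher in every cell, which is why the agreement is printed rather than assumed to be complete.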

The problem of training Adaline to act as the controller 
of an unknown and, as yet, uncontrolled plant is more difficult. 
A set of weights could be placed in Adaline, and the phase
space, with certain of the plant variables as its coordinate
axes, could be quantized and coded. On starting the plant
from certain initial conditions, Adaline would produce a
response and drive the plant. As the resulting trajectory 
passes through different coded regions, different responses 
would be produced which would force the plant to behave in 
a certain way. If the resulting trajectory is undesirable, 
Adaline must be adjusted in some way to produce a satisfactory 
trajectory. The main problem would be to devise
a suitable training procedure.

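
The arrangement described above, in which an Adaline holding some initial set of weights drives the plant through a coded phase space, can be sketched as a simple simulation. Everything specific here is an assumption made for illustration: the plant is taken to be a double integrator, the phase space is the plane of position `x` and velocity `v` divided into 12 strips per axis, the coding gives one digit to each strip, and the initial weights are random. No training procedure is attempted, since that is the open problem.

```python
import numpy as np

N_CELLS = 12   # hypothetical quantization: 12 strips per axis
LIMIT = 1.0    # hypothetical extent of the coded region on each axis

def quantize(value):
    """Map a coordinate to a strip index, clipping outside [-LIMIT, LIMIT]."""
    idx = int((value + LIMIT) / (2 * LIMIT) * N_CELLS)
    return min(max(idx, 0), N_CELLS - 1)

def encode(x, v):
    """Code the occupied cell as 24 digits of +1/-1 plus a threshold input."""
    digits = -np.ones(2 * N_CELLS + 1)
    digits[quantize(x)] = 1.0
    digits[N_CELLS + quantize(v)] = 1.0
    digits[-1] = 1.0
    return digits

def run(weights, x0, v0, dt=0.01, steps=500):
    """Let Adaline drive the assumed plant and record the trajectory."""
    x, v = x0, v0
    trajectory = [(x, v)]
    for _ in range(steps):
        # Adaline's two-level response to the currently occupied cell.
        u = 1.0 if encode(x, v) @ weights >= 0 else -1.0
        v += u * dt   # assumed plant: double integrator, d2x/dt2 = u
        x += v * dt
        trajectory.append((x, v))
    return np.array(trajectory)

rng = np.random.default_rng(0)
weights = rng.normal(size=2 * N_CELLS + 1)   # arbitrary initial weights
traj = run(weights, x0=0.8, v0=0.0)

print("final state (x, v):", np.round(traj[-1], 3))
print("cells visited:", len({(quantize(x), quantize(v)) for x, v in traj}))
```

With arbitrary weights the trajectory will generally be unsatisfactory. The sketch only shows how the response changes as the trajectory crosses from one coded region to another; how the weights should then be adjusted is left open, as in the text.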
Although the possible uses for Adaline seem plausible, 
they depend on solving the problem of Adaline's poor response
in certain areas of the phase plane. The schemes mentioned in 
sections 4.1 and 4.2 for improving Adaline's response near 
the origin do not attack the problem at its source, and an 
attempt should be made to improve the performance of Adaline 
itself. To date the information on the state of the plant 
has been presented to Adaline as a pattern after some form 
of coding. One area for further research would be to consider more
sophisticated ways of presenting Adaline with information
about the state of the plant and, possibly, presenting
additional information about the desired final state.